The Parallel Maximal Cliques Algorithm for Protein Sequence Clustering

Authors

Khalid Jaber

Nur'Aini Abdul Rashid

Rosni Abdullah

Abstract

Problem statement: Protein sequence clustering is a method for discovering relations between proteins by grouping them according to their common features. It is a core step in protein sequence classification. Graph theory has been used in protein sequence clustering to partition the data into groups, where each group constitutes a cluster. Mohseni-Zadeh introduced a maximal cliques algorithm for protein clustering. Approach: In this study we adapted the maximal cliques algorithm of Mohseni-Zadeh to find cliques in protein sequences, and then parallelized it to reduce computation time and allow large protein databases to be processed. We used the N-Gram Hirschberg approach proposed by Abdul Rashid to calculate the distance between protein sequences. The task farming parallel programming model was used to parallelize the enhanced cliques algorithm. Results: Our parallel maximal cliques algorithm was implemented on the stealth cluster in the C programming language, using a hybrid approach that combines the Message Passing Interface (MPI) library with POSIX threads (PThread) to accelerate protein sequence clustering. Conclusion: Our results showed a good speedup over the sequential algorithm for finding cliques in protein sequences.
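The pipeline the abstract describes (pairwise distances, a similarity graph, maximal cliques as clusters, work farmed out to workers) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `ngram_distance` is a toy n-gram Jaccard distance and not the N-Gram Hirschberg method, `maximal_cliques` is the textbook Bron-Kerbosch enumeration with pivoting and not Mohseni-Zadeh's algorithm, and a thread pool stands in for the MPI and PThread task farm of the paper's C implementation. The function names, the `threshold` parameter, and the sample sequences are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def ngram_distance(a, b, n=2):
    """Toy distance: 1 - Jaccard similarity of character n-grams.

    Assumption: a stand-in only, NOT the N-Gram Hirschberg method.
    """
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    union = grams_a | grams_b
    if not union:
        return 0.0
    return 1.0 - len(grams_a & grams_b) / len(union)


def build_graph(seqs, threshold, workers=4):
    """Farm pairwise-distance tasks to a worker pool, then link close pairs."""
    pairs = list(combinations(range(len(seqs)), 2))

    def task(pair):
        i, j = pair
        return ngram_distance(seqs[i], seqs[j])

    # The pool plays the role of the task farm: each pair is one task.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        dists = list(pool.map(task, pairs))

    adj = {i: set() for i in range(len(seqs))}
    for (i, j), d in zip(pairs, dists):
        if d <= threshold:  # an edge means "similar enough"
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch with pivoting; returns a sorted list of sorted cliques."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(sorted(r))  # r cannot be extended: it is maximal
            return
        # Pivot on the vertex with the most neighbours among the candidates.
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(found)


if __name__ == "__main__":
    seqs = ["ACDEFG", "ACDEFH", "ACDEFK", "WWYYTT"]  # made-up sequences
    print(maximal_cliques(build_graph(seqs, threshold=0.5)))
```

In this sketch each maximal clique is read as one cluster, so a sequence may belong to more than one cluster when cliques overlap. Parallelizing only the distance computation is a simplification; the abstract states that the cliques algorithm itself was parallelized.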

Similar resources

Large Scale Metagenomic Sequence Clustering via Sketching and Maximal Quasi-clique Enumeration on Map-Reduce Clusters

Taxonomic clustering of species from millions of DNA fragments sequenced from their genomes is an important and frequently arising problem in metagenomics. High-throughput next generation sequencing is enabling the creation of large metagenomic samples, while at the same time making the clustering problem harder due to the short sequence length supported and sampling of hitherto unknown species...

Coupling graph perturbation theory with scalable parallel algorithms for large-scale enumeration of maximal cliques in biological graphs

Data-driven construction of predictive models for biological systems faces challenges from data intensity, uncertainty, and computational complexity. Data-driven model inference is often considered a combinatorial graph problem where an enumeration of all feasible models is sought. The data-intensive and NP-hard nature of such problems, however, challenges existing methods to meet the requ...

Finding All Maximal Cliques in Dynamic Graphs

Clustering applications dealing with perception-based or biased data lead to models with non-disjoint clusters. There, objects to be clustered are allowed to belong to several clusters at the same time, which results in a fuzzy clustering. It can be shown that this is equivalent to searching for all maximal cliques in dynamic graphs like G_t = (V, E_t), where E_{t−1} ⊂ E_t, t = 1, ..., T; E_0 = ∅. In thi...

Bi-criteria scheduling in a hybrid flow shop environment with non-identical machines

This study considers scheduling in a hybrid flow shop environment with unrelated parallel machines to minimize mean job tardiness and mean job completion time. This problem has not been studied in the literature so far. The flexible flow shop environment is applicable in various industries such as wire and spring manufacturing, electronics, and production lines. After modeling the p...

CLIMP: Clustering Motifs via Maximal Cliques with Parallel Computing Design

A set of conserved binding sites recognized by a transcription factor is called a motif, which can be found by many applications of comparative genomics for identifying over-represented segments. Moreover, when numerous putative motifs are predicted from a collection of genome-wide data, their similarity data can be represented as a large graph, where these motifs are connected to one another. ...

Publication date: 2009
